Cross validation for the \(\alpha\)-k-NN regression for compositional response data.
aknnreg.tune(y, x, a = seq(0.1, 1, by = 0.1), k = 2:10, apostasi = "euclidean",
nfolds = 10, folds = NULL, seed = FALSE, B = 1, rann = FALSE)
A matrix with the compositional response data. Zeros are allowed.
A matrix with the available predictor variables.
A vector with a grid of values of the power transformation parameter; it has to be between -1 and 1. If zero values are present, it has to be greater than 0. If \(\alpha=0\), the isometric log-ratio transformation is applied.
The number of nearest neighbours to consider. It can be a single number or a vector.
The type of distance to use, either "euclidean" or "manhattan".
The number of folds. Set to 10 by default.
If you have already created the list with the folds, supply it here; otherwise leave it NULL and the folds will be created internally (see the sketch after the arguments).
If seed is TRUE, the results will always be the same.
If you want to correct for the optimistic bias, set this to a number greater than 1; otherwise no bootstrap bias correction takes place. With large sample sizes, say 1000 or more, bootstrap bias correction may not really be necessary.
If you have large scale datasets and want a faster k-NN search, you can use the kd-trees implemented in the R package "RANN". In this case you must set this argument equal to TRUE. Note, however, that in this case the only available distance is the "euclidean".
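The folds argument above can also be supplied by the user. A minimal sketch of this, assuming that folds is a list with one vector of row indices per fold (this structure is an assumption, not something stated in this page):

y <- as.matrix( iris[, 1:3] )
y <- y / rowSums(y)
x <- iris[, 4]
n <- dim(y)[1]
ina <- sample( rep(1:5, length.out = n) )  ## random allocation of the n observations to 5 folds
folds <- split(1:n, ina)  ## assumed structure: a list with one index vector per fold
mod <- aknnreg.tune(y, x, a = c(0.4, 0.6), k = 2:4, nfolds = 5, folds = folds)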
A list including:
The Kullback-Leibler divergence for all combinations of \(\alpha\) and k.
The Jensen-Shannon divergence for all combinations of \(\alpha\) and k.
The minimum Kullback-Leibler divergence.
The minimum Jensen-Shannon divergence.
The bootstrap bias corrected minimum Kullback-Leibler divergence.
The bootstrap bias corrected minimum Jensen-Shannon divergence.
The optimum \(\alpha\) that leads to the minimum Kullback-Leibler divergence.
The optimum k that leads to the minimum Kullback-Leibler divergence.
The optimum \(\alpha\) that leads to the minimum Jensen-Shannon divergence.
The optimum k that leads to the minimum Jensen-Shannon divergence.
The runtime of the cross-validation procedure.
A k-fold cross validation for the \(\alpha\)-k-NN regression for compositional response data is performed.
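The performance criteria are the Kullback-Leibler and the Jensen-Shannon divergences between the observed and the cross-validated fitted compositions. A minimal sketch of how such divergences can be computed is given below; the function names kl.div and js.div are hypothetical, the fitted compositions are assumed strictly positive in kl.div, and whether the package sums or averages over the observations is not stated here:

kl.div <- function(y, est) {
  ## Kullback-Leibler divergence, summed over all compositions (rows);
  ## zero parts of y contribute nothing, since 0 * log(0) is taken to be 0
  z <- y * log(y / est)
  sum( z[ is.finite(z) ] )
}

js.div <- function(y, est) {
  ## Jensen-Shannon divergence, the symmetrised version based on the mixture m
  m <- 0.5 * (y + est)
  0.5 * kl.div(y, m) + 0.5 * kl.div(est, m)
}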
Michail Tsagris, Abdulaziz Alenazi and Connie Stewart (2020). The alpha-k-NN regression for compositional data. https://arxiv.org/pdf/2002.05137.pdf
y <- as.matrix( iris[, 1:3] )
y <- y / rowSums(y)  ## make each row a composition (the parts sum to 1)
x <- iris[, 4]  ## a single continuous predictor
mod <- aknnreg.tune(y, x, a = c(0.4, 0.6), k = 2:4, nfolds = 5)
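Building on the example above, the bootstrap bias correction and the fast k-NN search can be requested through the B and rann arguments described earlier; the call below is only illustrative and requires the "RANN" package to be installed:

mod2 <- aknnreg.tune(y, x, a = c(0.4, 0.6), k = 2:4, nfolds = 5, B = 2, rann = TRUE)
## B = 2 adds the bootstrap bias corrected minimum divergences,
## while rann = TRUE uses the kd-trees of "RANN" with the euclidean distance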